improve performance of DepthwiseConv(NHWC) #31677
Branch force-pushed from 17e68c2 to 19dbc23.
Branch force-pushed from db36517 to 95b5d89.
Branch force-pushed from a4db1c2 to 454a0f8.
Branch force-pushed from acbfa82 to 14d16a7.
Branch force-pushed from 14d16a7 to 50b5508.
Tested the above cases on V100:
@@ -142,13 +141,14 @@ __device__ __inline__ void KernelDepthwiseConvNHWC(
   for (int w_in = w_in_start; w_in < w_in_end; w_in += dilate_width) {
     if (h_in >= h_start && h_in < h_end && w_in >= w_start && w_in < w_end) {
       int offset = ((batch * input_height + h_in) * input_width + w_in) *
-                   output_channels +
+                   input_channels +
The original code here seems to cause an error when input_channels is not equal to output_channels. We will add a case for this to the unit tests.
Yes, it should be input_channels here.
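The offset fix above can be illustrated with a minimal host-side C++ sketch. It assumes a depthwise convolution with a channel multiplier, so that output_channels = input_channels * multiplier, and that output channel c_out reads input channel c_out / multiplier; both the mapping and the helper names (NhwcOffset, InputOffsetForOutputChannel) are assumptions for illustration, not Paddle's code. It shows why the per-pixel stride of the input must be input_channels.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Paddle's API): flat offset of element (n, h, w, c)
// in a dense NHWC tensor whose innermost dimension has `channels` elements.
inline std::int64_t NhwcOffset(std::int64_t n, std::int64_t h, std::int64_t w,
                               std::int64_t c, std::int64_t height,
                               std::int64_t width, std::int64_t channels) {
  assert(c >= 0 && c < channels);
  return ((n * height + h) * width + w) * channels + c;
}

// Hypothetical helper: offset of the input element read for output channel
// c_out, assuming output_channels = input_channels * multiplier and
// c_in = c_out / multiplier.
inline std::int64_t InputOffsetForOutputChannel(
    std::int64_t n, std::int64_t h_in, std::int64_t w_in, std::int64_t c_out,
    std::int64_t input_height, std::int64_t input_width,
    std::int64_t input_channels, std::int64_t multiplier) {
  const std::int64_t c_in = c_out / multiplier;
  // The per-pixel stride must be input_channels: the input tensor stores only
  // input_channels values per pixel, no matter how many outputs each feeds.
  return NhwcOffset(n, h_in, w_in, c_in, input_height, input_width,
                    input_channels);
}
```

For a 2x4x4x3 input with multiplier 2, the element for batch 1, pixel (2, 3), output channel 5 sits at offset 83; computing the same offset with output_channels (6) as the stride gives 164, which is past the end of the 96-element input. When multiplier is 1 the two strides coincide, which is why a test with input_channels != output_channels is needed to expose the bug.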
     } else {
-      value += weight[weight_offset] * in_data;
+      value += weight[0] * in_data;
Could you describe why this change was made?
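To improve gld_efficiency (global memory load efficiency), this PR transposes filter_data from CHW to HWC. The weight at (h_f, w_f, c_out) is therefore addressed as const T* weight = filter_data + weight_offset * output_channels + c_out, where weight_offset equals h_f * filter_width + w_f. Because the pointer already includes weight_offset, the kernel reads the tap as weight[0] rather than weight[weight_offset].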
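The layout change described in the reply can be sketched on the host as follows. This is a hedged illustration, not the PR's code: it assumes the depthwise filter is stored as [output_channels][filter_height][filter_width] ("CHW") before the transpose, and TransposeFilterChwToHwc and WeightPtr are hypothetical names. WeightPtr mirrors the pointer arithmetic quoted above, so dereferencing it (weight[0]) yields the tap for (h_f, w_f, c_out).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a depthwise filter from
// [output_channels][filter_height][filter_width] ("CHW") to
// [filter_height][filter_width][output_channels] ("HWC").
template <typename T>
std::vector<T> TransposeFilterChwToHwc(const std::vector<T>& chw,
                                       int output_channels, int filter_height,
                                       int filter_width) {
  assert(chw.size() ==
         static_cast<std::size_t>(output_channels) * filter_height * filter_width);
  std::vector<T> hwc(chw.size());
  for (int c = 0; c < output_channels; ++c)
    for (int h = 0; h < filter_height; ++h)
      for (int w = 0; w < filter_width; ++w)
        hwc[(h * filter_width + w) * output_channels + c] =
            chw[(c * filter_height + h) * filter_width + w];
  return hwc;
}

// Mirrors weight = filter_data + weight_offset * output_channels + c_out,
// with weight_offset = h_f * filter_width + w_f, on an HWC filter.
template <typename T>
const T* WeightPtr(const T* filter_data, int h_f, int w_f, int c_out,
                   int filter_width, int output_channels) {
  const int weight_offset = h_f * filter_width + w_f;
  return filter_data + weight_offset * output_channels + c_out;
}
```

One plausible reading of the gld_efficiency gain, stated as an assumption: in HWC, threads that handle consecutive c_out for the same (h_f, w_f) load adjacent addresses (WeightPtr for c_out and c_out + 1 differ by exactly one element), which lets the hardware coalesce their loads, whereas in CHW those threads would be filter_height * filter_width elements apart.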
Branch force-pushed from 50b5508 to f1bca11.
Branch force-pushed from f1bca11 to 29bb5a9.
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Improve performance of DepthwiseConv(NHWC):
- Forward of DepthwiseConv(NHWC)
- Backward of DepthwiseConv(NHWC)
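The PR page does not include its test code. As a hedged sketch of how the forward change could be checked, here is a naive CPU reference for an NHWC depthwise convolution that uses the HWC filter layout and could serve as an oracle for the unit-test case with input_channels != output_channels mentioned in the review. The function name and its assumptions (symmetric zero padding, no bias, c_in = c_out / multiplier) are illustrative, not Paddle's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive reference (not Paddle's kernel). Assumptions:
//   input:  [batch][in_h][in_w][in_c]   (NHWC)
//   filter: [k_h][k_w][out_c]           (HWC, after the transpose)
//   out_c = in_c * multiplier; output channel co reads ci = co / multiplier
//   symmetric zero padding `pad`, no bias.
std::vector<float> DepthwiseConvNhwcRef(const std::vector<float>& input,
                                        const std::vector<float>& filter,
                                        int batch, int in_h, int in_w, int in_c,
                                        int multiplier, int k_h, int k_w,
                                        int stride, int pad, int dilation) {
  const int out_c = in_c * multiplier;
  const int out_h = (in_h + 2 * pad - (dilation * (k_h - 1) + 1)) / stride + 1;
  const int out_w = (in_w + 2 * pad - (dilation * (k_w - 1) + 1)) / stride + 1;
  assert(input.size() == static_cast<std::size_t>(batch) * in_h * in_w * in_c);
  assert(filter.size() == static_cast<std::size_t>(k_h) * k_w * out_c);
  std::vector<float> out(static_cast<std::size_t>(batch) * out_h * out_w * out_c,
                         0.f);
  for (int n = 0; n < batch; ++n)
    for (int oh = 0; oh < out_h; ++oh)
      for (int ow = 0; ow < out_w; ++ow)
        for (int co = 0; co < out_c; ++co) {
          const int ci = co / multiplier;
          float acc = 0.f;
          for (int fh = 0; fh < k_h; ++fh)
            for (int fw = 0; fw < k_w; ++fw) {
              const int ih = oh * stride - pad + fh * dilation;
              const int iw = ow * stride - pad + fw * dilation;
              // Taps that fall in the zero padding contribute nothing.
              if (ih < 0 || ih >= in_h || iw < 0 || iw >= in_w) continue;
              // Per-pixel input stride is in_c (not out_c).
              acc += input[((n * in_h + ih) * in_w + iw) * in_c + ci] *
                     filter[(fh * k_w + fw) * out_c + co];
            }
          out[((n * out_h + oh) * out_w + ow) * out_c + co] = acc;
        }
  return out;
}
```

For example, a 1x2x2x1 input {1, 2, 3, 4} with multiplier 2 and a 1x1 filter {10, 100} yields {10, 100, 20, 200, 30, 300, 40, 400}: each input pixel feeds two output channels, so output_channels differs from input_channels, exactly the case the original indexing got wrong.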